Training Deeper Models by GPU Memory Optimization on TensorFlow
Authors
Abstract
With the advent of big data, easy access to GPGPUs, and progress in neural network modeling techniques, training deep learning models on GPUs has become a popular choice. However, due to the inherent complexity of deep learning models and the limited memory resources on modern GPUs, training deep models is still a nontrivial task, especially when the model is too big for a single GPU. In this paper, we propose a general dataflow-graph-based GPU memory optimization strategy, "swap-out/in", which uses host memory as a larger memory pool to overcome the limits of GPU memory. In addition, dedicated optimization strategies are proposed for memory-consuming sequence-to-sequence (Seq2Seq) models. These strategies are integrated into TensorFlow seamlessly and incur no accuracy loss. In extensive experiments, significant reductions in memory usage are observed, and the maximum training batch size can be increased by a factor of 2 to 30 for a fixed model and system configuration.
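The data movement behind swap-out/in can be illustrated with a small hand-written sketch. This is not the paper's implementation, which rewrites the dataflow graph automatically; the tensor shapes, op placement, and variable names below are illustrative assumptions only.

```python
# Minimal sketch of the swap-out/in idea in TensorFlow 1.x graph mode:
# copy a large intermediate tensor to host memory after it is produced,
# then copy it back to the GPU just before it is consumed again.
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

with tf.device('/gpu:0'):
    x = tf.random.normal([4096, 4096])
    activation = tf.nn.relu(tf.matmul(x, x))  # large intermediate tensor

# "Swap-out": an identity op pinned to the CPU forces a device-to-host copy,
# so the activation does not have to stay resident in GPU memory.
with tf.device('/cpu:0'):
    swapped_out = tf.identity(activation)

# "Swap-in": an identity op pinned back to the GPU copies the tensor back
# right before the op that consumes it (e.g., the corresponding gradient op).
with tf.device('/gpu:0'):
    swapped_in = tf.identity(swapped_out)
    loss = tf.reduce_sum(swapped_in)

config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(loss))
```

In the paper's approach this insertion of swap nodes is done automatically on the dataflow graph, so existing models benefit without any changes to user code.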
Similar references
Horovod: fast and easy distributed deep learning in TensorFlow
Training modern deep learning models requires large amounts of computation, often provided by GPUs. Scaling computation from one GPU to many can enable much faster training and research progress but entails two complications. First, the training library must support inter-GPU communication. Depending on the particular methods employed, this communication may entail anywhere from negligible to s...
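For context, a minimal sketch of Horovod's TensorFlow 1.x API is shown below; the toy model and hyperparameters are assumptions, but the hvd.init, hvd.size, hvd.DistributedOptimizer, and hvd.BroadcastGlobalVariablesHook calls follow Horovod's documented interface.

```python
import tensorflow.compat.v1 as tf
import horovod.tensorflow as hvd

tf.disable_eager_execution()
hvd.init()  # set up inter-process / inter-GPU communication

# Toy model: a single linear layer trained with mean-squared error.
x = tf.placeholder(tf.float32, [None, 10])
y = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.zeros([10, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

# Scale the learning rate by the number of workers and wrap the optimizer;
# the wrapper averages gradients across workers with allreduce before applying.
opt = tf.train.GradientDescentOptimizer(0.01 * hvd.size())
opt = hvd.DistributedOptimizer(opt)
train_op = opt.minimize(loss)

# Broadcast the initial variable values from rank 0 so all workers start
# from the same state.
hooks = [hvd.BroadcastGlobalVariablesHook(0)]
```

A script like this would typically be launched once per GPU, e.g. with `horovodrun -np 4 python train.py`.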
Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs
Deep learning frameworks have been widely deployed on GPU servers for deep learning applications in both academia and industry. In the training of deep neural networks (DNNs), there are many standard processes or algorithms, such as convolution and stochastic gradient descent (SGD), but the running performance of different frameworks might be different even running the same deep model on the sa...
TBD: Benchmarking and Analyzing Deep Neural Network Training
The recent popularity of deep neural networks (DNNs) has generated a lot of research interest in performing DNN-related computation efficiently. However, the primary focus is usually very narrow and limited to (i) inference – i.e. how to efficiently execute already trained models and (ii) image classification networks as the primary benchmark for evaluation. Our primary goal in this work is to ...
Stochastic Gradient Descent on Highly-Parallel Architectures
There is an increased interest in building data analytics frameworks with advanced algebraic capabilities both in industry and academia. Many of these frameworks, e.g., TensorFlow and BIDMach, implement their compute-intensive primitives in two flavors: as multi-threaded routines for multi-core CPUs and as highly parallel kernels executed on GPUs. Stochastic gradient descent (SGD) is the most popula...
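As a point of reference, the SGD primitive mentioned above amounts to the update w ← w − lr·∇L(w) applied per mini-batch. The following NumPy sketch is purely illustrative and not taken from the cited work; the synthetic data and step size are assumptions.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """One stochastic gradient descent update: w <- w - lr * grad."""
    return w - lr * grad

# Synthetic linear-regression data for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + 0.1 * rng.normal(size=256)

w = np.zeros(5)
for epoch in range(20):
    for i in range(0, len(X), 32):                   # mini-batches of 32
        xb, yb = X[i:i + 32], y[i:i + 32]
        grad = 2.0 * xb.T @ (xb @ w - yb) / len(xb)  # gradient of MSE loss
        w = sgd_step(w, grad)
```

The frameworks surveyed in that work run this same loop as multi-threaded CPU routines or as GPU kernels, which is where their performance differences arise.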
Parallelization of Rich Models for Steganalysis of Digital Images using a CUDA-based Approach
There are several different methods for building an efficient strategy for steganalysis of digital images. A very powerful method in this area is the rich model, which consists of a large number of diverse sub-models in both the spatial and transform domains. However, extracting the various types of features from an image is very time consuming in some steps, especially for the training pha...
Publication date: 2017